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Abstract — We propose a new method for joint segmentation 
of monotonously growing or shrinking shapes in a time sequence 
of noisy images. The task of segmenting the image time series is 
expressed as an optimization problem using the spatio-temporal 
graph of pixels, in which we are able to impose the constraint 
of shape growth or of shrinkage by introducing monodirectional 
infinite links connecting pixels at the same spatial locations in 
successive image frames. The globally optimal solution is com- 
puted with a graph cut. The performance of the proposed method 
is validated on three applications: segmentation of melting sea 
ice floes and of growing burned areas from time series of 2D 
satellite images, and segmentation of a growing brain tumor 
from sequences of 3D medical scans. In the latter application, 
we impose an additional intersequences inclusion constraint by 
adding directed inflnite links between pixels of dependent image 
structures. 

Index Terms — Video segmentation, shape growth, graph cut, 
energy minimization, inflnite links. 

I. Introduction 

A utomatic segmentation of objects in videos is a diffi- 
cult endeavor in computer vision [1]. This task becomes 
even more challenging for image sequences with low signal- 
to-noise ratio or low contrast between intensities of spatially 
adjacent objects in the image scene. Such challenging data are 
recorded frequently, for instance, in satellite remote sensing or 
medical imaging. 

Image segmentation methods applied independently to each 
frame [2], [3] produce unstable results, while temporal coher- 
ence in video sequences yields a lot of information not 
available for a single image. There are two main categories 
of approaches for the spatio-temporal segmentation of image 
sequences. Causal, or feedforward, techniques consider only 
past data for segmenting each next frame [4], [5]. Omniscient 
approaches take advantage of both past and future data by 
analyzing the video as a 2D-hT = 3D spatio-temporal pixel 
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volume [6]-[8]. In this case, the segmentation of the entire 
image set supports each of the individual segmentations. 
Graph-based methods gained popularity among omniscient 
approaches, in particular those using hierarchical model [8], 
normalized cuts [6] or graph cuts [9]. These techniques do 
not impose any shape prior knowledge. It was proven that 
introducing shape priors into image segmentation, i.e. favoring 
segmentations similar in some sense to a given shape, allows 
to drastically improve segmentation of objects in the presence 
of strong noise and occlusions [10], [11]. However, impos- 
ing shape priors increases significantly both algorithmic and 
computational complexity of segmentation algorithms [12]. 
Schoenemann and Cremers proposed to compute minimal ratio 
cycles in a large product graph spanned by the image and 
the shape template for finding globally optimal segmentations, 
which are consistent both with edge information and with a 
shape prior [13], [14]. In order to speed up the algorithm, 
they implemented it in parallel on graphics hardware. This 
algorithm can be applied for segmenting shapes in time series 
in a causal way, so that the template determined for the last 
frame is matched to the next. However, it does not guarantee 
globally optimal solution over the whole temporal sequence. 

In this paper, we focus on segmenting shapes which only 
grow or shrink in time, from sequences of extremely noisy 
images. Examples of growing shapes are forest fires or 
expanding cities in satellite images and organ development in 
medical imaging. In the image sequences we consider, both 
foreground and background intensity distributions can vary 
significantly over time: foreground can be heavily occluded 
or undistinguishable from a part of the background, and data 
for some pixels can be missing (see Figs. 2 and 7 for exam- 
ples of such sequences). Most of previously-proposed spatio- 
temporal methods rely on coherence of foreground/background 
intensity distributions in successive image frames, and are 
therefore not suited for segmenting such noisy data sets. 
Few approaches have been specifically designed for spatio- 
temporal segmentation of magnetic resonance image (MRI) 
sequences with low signal-to-noise ratio [9], [15]. Applied to 
multi-temporal time series that show a monotonously growing 
or shrinking structure, however, these smoothing methods 
bias results towards the mean shape obtained from averaging 
consecutive segmentations and, hence, underestimate rapid 
growth or shrinkage events. 

To address this issue, we propose a new omniscient seg- 
mentation framework based on graph cuts for the joint seg- 
mentation of a multi-temporal image sequence. It introduces 
growth or shrinkage constraint in graph cuts by using directed 
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infinite links, which connect pixels at the same spatial loca- 
tions in successive image frames. By minimizing a submodular 
energy computed on the resulting spatio-temporal graph of the 
image sequence, the proposed method yields a globally opti- 
mal solution. Differently from the state-of-the-art omniscient 
techniques, it does not rely on the coherence of the intensity in 
time, but only on the coherence of the shape. To summarize, 
the main contribution we make in this paper is: 

1) a new framework for segmentation of 2D/3D image time 
series with the constraint of shape growth/shrinkage, 

2) in order to be able to segment extremely noisy/ 
low-contrast/incomplete data, 

3) in a very low computational time (see Fig. 5: linear time 
in the number of frames in practice). 

We validate the performance of the proposed framework on 
three applications with very noisy image sequences. The first 
one deals with the segmentation of multiyear sea ice floes in 
a set of satellite images acquired through different satellite 
sensors. The new method returns accurate melting profiles 
of sea ice, which is important for building climate models. 
The second application segments growing burned areas from 
time series of optical satellite images with missing data. The 
third application addresses the segmentation of brain tumors 
from longitudinal sets of multimodal MRI volumes, where 
we impose additional inter-modal inclusion constraints for the 
joint segmentation of different image structures (brain tissues). 

Note that the preliminary results related to this research 
have been presented in the conference paper [16]. The rest 
of the paper is organized as follows. In the next section, the 
proposed graph-cut-based segmentation framework is 
presented. Experimental results are discussed in Sec. Ill and 
conclusions are drawn in Sec. IV. 

II. Enforcing Shape Growth/Shrinkage 
IN Graph Cuts 

Graph cut is an optimization tool coming from graph theory, 
based on the rewriting of image segmentation problems as 
(s,t)-min-cuts in graphs, on the equivalence of (s,t)-min-cut 
and max- flow problems, and on the existence of efficient 
algorithms to solve the latter ones [17]-[19]. In practice in 
computer vision, it can be used to find the globally optimal 
binary segmentation of images where the segmentation crite- 
rion is related to a Markov Random Eield with submodular 
interaction terms, i.e. a criterion E of the form: 

E{L) = Y, Wi) + ( 1 ) 

pixels/ /~7 

where L is the binary labelling function to be found (L/ is 
the label of pixel /), individual potentials Vi are any binary 
real-valued functions measuring the disagreement between 
a prior probabilistic model and the observed data, i ~ j 
denotes a pair of neighboring pixels (any neighborhood system 
can be used), and Wij are any real- valued interaction terms 
between neighboring pixels expressing spatial coherency of 
labels, satisfying 

Wij(0,0) + Wijih 1) ^ Wij(0, 1) + WijihO). (2) 


A directed infinite link between two pixels expresses 
precisely the constraint that this pair of pixels cannot have 
the pair of labels (0,1), by assigning an infinite cost to such 
an interaction. The remaining possible pairs of labels are thus 
(0,0), (1,1) and (1,0), which means that either both pixels have 
the same label, or the order of labels is predefined (1 for first 
pixel and 0 for the second one). In the case of image binary 
segmentation, if 0 stands for the background and 1 for the 
foreground object, then this means that the second pixel may 
belong to the foreground only if the first one already does. 

A. Related Works 

Extensions of graph cuts to multi-label problems 
{i.e. multi-class segmentation) have been proposed but 
generally do no guarantee optimal solutions, except e.g. 
in the case of Ishikawa’s construction [20], which requires 
labels to be ordered and the interaction term to be a convex 
function of their differences, i.e. Wij(Li, Lj) = g{Lt — Lj) 
with g convex. This graph construction makes intensive 
use of infinite links to constrain the min-cut solutions to 
satisfy desired properties required to interpret them as image 
segmentation solutions. This was the source of inspiration for 
our work. 

A study related to shape constraints can be found in [21], 
where one image has to be segmented in several possibly- 
overlapping objects. Infinite links are used for imposing com- 
mon boundaries, inclusion or exclusion conditions between 
objects in a same single image. A similar approach [22] 
segments jointly two surfaces in a same volumic image, under 
the constraint that they should be separated by a given minimal 
margin. There is however no work related to shape growth or 
shrinkage in time series. 

Wolz et al. [9] applied graph cuts for simultaneous 
segmentation of serially acquired MRI volumes. They defined 
temporal edge weights as the intensity differences of voxels 
at the same spatial locations. The same smoothness constraint 
was applied both in space and time, and the segmentations 
at different timepoints were forced to be consistent in areas 
where a small intensity difference between the images exist. 
This type of temporal constraint is suboptimal in image series 
where intensity distributions of foreground and background 
vary significantly over time. To the best of our knowledge, 
our work is the first to use infinite links to enforce a temporal 
growth constraint, and we illustrate in Sec. Ill the advantage 
of the new method over previous approaches such as [9] . 

B. Growth/Shrinkage Constraint 

Given a sequence of images I{t) preliminarily aligned, 
shape growth can be easily expressed as the property that the 
foreground object cannot lose any pixel when time advances. 
Otherwise said, if a pixel belongs to the foreground object 
at time t\, then it belongs also to the foreground object 
for all times t 2 > t\. Equivalently, and simpler: a pair of 
pixels {{x , y , t) , {x , y , t -\- \)), sharing the same location and 
immediately successive in time, cannot have the pair of labels 
(1,0), with the same binary segmentation notations as above. 
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Image grid at time t 


Image grid at time t 



from all nodes 
Independent segmentation of T frames 


Sequence 1 Directed infinite links Sequence 2 



Joint segmentation with growth enforcement 


Fig. I. (a) Enforcing shape growth in an image sequence, (b) Segmenting jointly two sequences SI and S2, by enforcing the foreground of SI to contain 
the foreground of S2, with directed infinite links from SI towards S2, between all pixels of coordinates (x, y, t, and (x, y, t, S 2 ). 


This can be enforced by setting monodirectional infinite links 
from all pixels to their immediate predecessor in time. 

Given T images 7(f), with t G [1, T], and as many 
associated submodular segmentation criteria E\ we transform 
the problem of segmenting independently each image 7(f) 
according to its criterion into a joint segmentation of all 
images together, by enforcing the shape growth constraint with 
directed infinite links (see Fig. 1(a)). Thus, instead of applying 
graph cut T times independently to planar grids of the size 
of the images IT x 7f, we apply graph cut once to a 3D grid 
IT X 77 X r, consisting of the same nodes and edges, but with 
additional monodirectional infinite links in time. The criterion 
to be minimized is then E = E^ under the constraint of 
shape growth: 

E{L) = Y. 

pixels i i^j 

t 

Since the problem is binary and submodular, the solution 
found by graph cut is globally optimal. 

Manifestly, one can enforce shape shrinkage instead of 
shape growth, by reversing the direction of the infinite 
links. Another straightforward extension, needed in Sec. III-C, 
consists in applying this approach to the case of sequences of 
3D images. The directed infinite links are then set for all pairs 
of voxels of the form ((x, y, z, f), (x, y, z, f — 1)) to enforce 
3D shape growth. 

In some applications, it may happen that growth (or shrink- 
age) is only very probable, but not with probability 1, i.e. 
growth should be considered as a probable hint but should not 
be enforced strictly at all locations at all times. In that case, 
one may replace directed infinite links by directed finite links: 
the weights of these links will encourage growth (more or less 
strongly depending on the weight), but sufficiently disagreeing 
potentials Vi may make the shape locally shrink instead. Thus, 
shrinkage would be discouraged but not forbidden. 

C. Inter-Sequences Inclusion Constraint 

It is also possible to segment jointly several image 
sequences I{s){t) with the constraint that the foreground 
object in some sequences should be included in the foreground 


object of some other sequences. This can be done similarly by 
considering together the graphs associated to all sequences, 
and, for each inclusion constraint, by adding directed infinite 
links between pixels of the desired sequences and S 2 , 
sharing same location and time: such links from {s\,x,y,t) 
to (s 2 ,x,y,t) for all X, y, r will force the foreground object in 
sequence s\ to contain the one of sequence S 2 (see Fig. 1(b)). 
An example of such application is given in Sec. III-C, where 
image sequences correspond to four different MRI modalities, 
aligned both in space and time. 

Naturally, instead of imposing an inclusion constraint over 
the whole time span and the whole image space, it is possible 
to specify spatio-temporal domains of constraints, for instance 
to express that the inclusion property between two sequences 
has to be satisfied inside a pre-defined region and/or during a 
pre-defined time span [t\, t 2 ] only, by adding directed infinite 
links in these sets only. 

D. Weighting Erames by Reliability 

As presented earlier, enforcing shape growth explicitly is 
particularly important when facing noisy sequences of images, 
as a joint segmentation incorporates information from different 
frames. With our approach, a fully noisy frame in the middle 
of several good-quality frames will be automatically ignored 
in practice, because statistically the (bad) potentials T/ at any 
pixel i in that frame will be neglectable with respect to all 
other (correct) potentials at the same pixel in other frames. 
Hence, the solution we bring will be strongly robust to any 
kind of noise, provided that the noise is not coherent in time 
during long time spans. 

The quantity of noise that the method can handle depends 
on the number of neighboring frames with good-quality infor- 
mation at the same spatial location. One way to increase 
even more the trade-off between admissible level of noise 
and quantity of good information consists in estimating the 
reliability of the information given, in each image or even 
at each pixel, and in weighting accordingly the associated 
energies : E = Wt E^ . For instance, if a strong level 

of noise is detected globally in one image I{t), one may 
multiply the corresponding energy E^ by a small reliability 
factor iDt < 1, in order to make it less influential than 
other frames. Such a reliability factor could be computed 
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for instance from preliminary image processing steps 
(for example, correlation between an image I{t) and its 
neighboring ones ; 0 < \t — t'\ ^ St]). 

E. Complexity 

The precise theoretical worst case complexity depends on 
the graph cut algorithm used, and is the same as for usual 
single-image graph cuts, but with T times more nodes and 
edges. Denoting by N the number of nodes and M the number 
of edges, this worst case complexity is O(NM^) for the 
Edmonds-Karp algorithm, O(N^M) for the Dinitz blocking 
flow algorithm, which goes down to 0(NMlog(N)) using 
dynamic trees. However, the computational time observed in 
practice is known to be much faster on typical image segmen- 
tation problems. We applied the binary graph-cut algorithm of 
Boykov and Kolmogorov [17], and we report a linear observed 
complexity with the total number of pixels T x W x if 
(see for instance Fig. 5 for experiments of section III- A). As a 
consequence, enforcing shape growth on a long sequence, or 
on complementary shorter bits of the same sequence, will take 
approximately the same time. 

In the case of long sequences of big images, the memory 
space required may exceed the capacities of a computer. This 
is however not an issue, as there exist graph cut implemen- 
tations for massive grids [23] meant for such cases, where 
all information is not stored in the memory at all times. This 
was not required for the experiments presented in this paper 
though. 

E Rewriting as a Multi-Label Problem 

We show now another point of view on sequence segmen- 
tation with growth constraint. The successive labels L/ {t) of a 
given pixel i over time might change only once, and only 
from 0 (background) to 1 (foreground object). Hence, this 
vector of labels Li(t) is of the form (0, 0, . . . , 0, 1, . . . , 1, 1) 
and can be represented by just the time index t/ of the first 1, 
i.e. the earliest time at which the pixel starts belonging to the 
object. This time r/ is in [1, T-1-1], with T-\-l meaning “never.” 
We have thus transformed a binary optimization problem on 
a sequence of images with shape growth constraint into a 
multi-label problem defined on one single image, without any 
constraint. This new problem can be expressed in the Markov 
Random Field form (1) with 

Vi ( Ti ) := X + X y/(l) 

t<Ti t^Ti 

and 

Wij(Ti,Tj) := Y. 

t <min(r/ , tj ) r/ < Xj 

Tj < Ti t ^max(r; , xj ) 

where (V/) and {W- ■) define the energy E^ at time t, and 
where sets of summation may be empty (note that any time t 
appears in exactly one summation set only). The submodular- 
ity of the binary interaction terms in each frame implies 


the submodularity of the multilabel interaction term W, i.e. 

n) + ^ Wij{Tu 12) 

for all labels satisfying ri ^ and Z 2 ^ z^ (proof in 
Appendix). Thus, this energy can be minimized globally effi- 
ciently with now standard techniques (see [24]). Note that in 
the particular case where interaction terms do not depend 
on t, the interaction term W of this multi-label energy above 
can be rewritten as a convex function g of (t/ — zj), and then 
Ishikawa’s construction [20] can be applied. It turns out that 
the graph built this way is precisely the graph that we built in 
our initial binary multi-frame problem. Our initial formulation 
is however more flexible, in that interaction terms can depend 
on t, and more natural, in that inclusion constraints can easily 
be enforced in spatial or/and time subregions only, while this 
would not be expressible with the multi-label formulation. 

HI. Experimental Results 

We applied the proposed method to three different applica- 
tions, in order to validate its use to enforce shape shrinkage 
or growth, for 2D or 3D image sequences: 

(Application 1) We impose shrinkage constraint for a 
2D sequence, to segment a melting multiyear ice from a time 
series of satellite measurements. 

(Application 2) We enforce shape growth for a 2D sequence, 
to segment burned areas from optical satellite images. 

(Application 3) We use growth constraint for 3D image 
sequences, to segment growing brain tumors from multimodal 
MRI volumes. 

The performance of our framework with monodirectional 
links, [Mono=const], is compared with other graph-cut-based 
methods: 

• [w/o]\ Graph cut with no temporal links, i.e. independent 
segmentation of each frame. 

• [Eeedforward]: After segmenting the first frame with 
graph cut approach, foreground/background pixels are 
marked as seeds with infinite unary costs in the next frame 
for enforcing shape growth/shrinkage. 

• [Bi= const]: Smoothing by introducing bidirectional tem- 
poral links, i.e. links in both directions (from t to t 1 
and from r -h 1 to t), with a constant weight (finite or 
infinite). 

• [Bi=variable]: As proposed in [9], bidirectional temporal 
links are computed based on intensity differences between 
pixels in successive image frames, i.e. in the same way 
as spatial links. 

For comparison between the methods we used the Dice 
score [25], D = (2|M H M|)/(|M| -h |M|), where M and M 
are manually and automatically segmented foreground regions, 
respectively. 

A. Application 1: Melting Sea Ice in Satellite Images 

The melting of sea ice is correlated to the increases in sea 
surface temperature and to associated climatic changes. Thus 
it is very important to monitor sea ice evolution and to develop 
methods for automated analysis of satellite measurements. 
Previous works on ice floe segmentation attempted to extract 
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(c) (d) 


Fig. 2. (a, b) Reprojected MODIS images captured on the 244th and 252th 

days, respectively, (c, d) Corresponding AMSR-E images. 


temporal information by using ice percentages, area and shape 
parameters of the ice floes at the previous time moments 
as prior information to segment the floe at the next time 
moment [5], [26]. These feed-forward approaches were unable 
to accurately estimate melting ice profiles because of the low 
signal-to-noise ratio and of the lack of contrast in satellite data, 
producing area estimations that are noisy in time, while a 
multiyear ice floe can actually only melt in the summer 
period. 

We aimed at segmenting a multiyear ice floe from a 
45-day sequence (summer period from the 227th to 271th 
day of 2008) of measurements over the polar regions by 
two Aqua satellite sensors: Advanced Microwave Scanning 
Radiometer - Earth Observing System (AMSR-E, 89 GHz, 
6.25 km spatial resolution) and Moderate-Resolution Imaging 
Spectroradiometer (MODIS, band 1, 0.620 — 0.670 //m, 250 m 
spatial resolution). Eig. 2 shows MODIS and AMSR-E images 
at two time moments from the considered data set. The floe 
was tracked from the AMSR-E data, where multiyear ice has 
a low microwave emissivity (dark area in Eig. 2), and is in 
this way distinguishable from clouds and younger ice which 
has a higher emissivity (white area in Eig. 2). However, the 
low spatial resolution of these data does not allow to quantify 
the ice floes areas accurately. In accordance with the tracking 
measurements, a time series of T = 75 MODIS and upscaled 
AMSR-E images with the ice floe was built, with spatial 
dimensions of 800 x 800 pixels. We denote each MODIS 
image by /^ and each upscaled AMSR-E image smoothed 
by Gaussian filter by r = 1, . . . , T. 

The objective is to compute T segmentation maps V , where 
each pixel (v, y) has label = 1 if it belongs to the floe 

at time t, and 0 otherwise. In order to apply the proposed 
method with a shrinkage constraint to the selected time series, 
the images must be aligned, so that the property that the floe 
in the image is included in the floe of the image at the 
previous time moment P can be expressed directly in terms 
of pixel locations. 


Eor this purpose, we estimated a reliable region of the fore- 
ground {i.e., a region which can be considered with high 
probability as a part of the floe), Rp, and a reliable region 
of the background, Rp, from the AMSR-E images, where 
a multiyear ice floe is darker than water, young ice and 
clouds. Erom the AMSR-E tracking measurements, we derived 
a foreground seed point and an approximate area of the floe 
for each time moment t. We then grew a reliable region of 
the floe R/7 in each A^ from the foreground seed point, until 
it reached approximately half of the size of the floe. Another 
region, Rp, was grown in A^ from the same foreground seed 
point, but until an area half larger than the approximate floe 
size. The complementary region of pixels not in Rp, was 
considered with high probability as part of the background. 

We then computed the histograms of the intensities P of the 
floe, p^(I\F), and of the background, respectively, 

and a map of floe probabilities as 


p\F\I) 

P\B) 


p\l\F)P\F) 

p\l\F)pt{F)Fp^{l\B)P\By 
P — ■ ^ ^ (G 


( 4 ) 

P\B). 


The quantities above depend on the pixel location. The 
images P were aligned by exhaustive searching over rigid 
motions (rotations and translations) to maximize the correla- 
tion between maps of foreground probabilities at the current 
and previous moments. This search is largely affordable given 
the low frame rate. We computed potentials and interaction 
terms between neighboring pixels as: 


Vl{\) = -ln{p\F\ip, 
Wlj = P exp 


Vl(0) = -ln[P(B\I)l 
2a^ 


( 5 ) 

( 6 ) 


where <7 is a standard deviation of is a parameter that 
controls the importance of the spatial interaction energy term. 
We found experimentally that setting y = 2 yields robust 
results. The proposed method was applied with monodirec- 
tional temporal links to enforce floe shrinkage, as described 
in Sec. II. We performed several experiments, with different 
values of the constant m standing for the temporal link 
weight, from 0.25 to oo. The results [Mono = 0.25... oo] 
are compared with those obtained with other graph-cut-based 
approaches (listed in the beginning of Sec. Ill) in Table I 
and in Eig. 3-4. Both graph-cut with no temporal links and 
feedforward approaches show the worst performances, and 
prove to be not well suited for segmenting such noisy data 
sets. When a feedforward method encounters a frame with the 
part of the floe obscured by clouds and thus undistinguishable 
from the background, it segments only the visible part of the 
foreground, and then is trapped in a non- sense segmentation 
for the rest of future times. The method using gradient-based 
temporal links [Bi = variable] [9] also yields poor segmen- 
tation accuracies, because it is sensitive to both noise and 
variation of foreground/background intensities in consecutive 
frames. 
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TABLE I 

Results for Application 1 . Mean and Standard Deviations of the Dice Scores for the Proposed Method 
[Mono = oo] AND Graph-Cut-Based Approaches Used for Comparison 


Method 

Feedforward 

w/o 

Bi = variable 

Bi = 16 

Mono = oo 

Dice score 

.554 ± .128 

.933 ± .099 

.958 ± .048 

.978 ± .007 

.980 ± .007 


Day Image Feedforward w/o Bi = variable Bi = 4 Bi = 16 Mono = oo 



Fig. 3. (From top to bottom) Aligned images and segmentation contours (red) for four time moments (days 230, 233, 235 and 267, respectively) computed by 
the graph-cut methods: (From left to right) Aligned image, [Feedforward], [w/o], [Bi = variable], [Bi = 4], [Bi = 16], proposed method with monodirectional 
infinite links. Manual segmentation is shown in green. The rightmost part of the white area in the third row is not part of the object, but another ice floe who 
temporarily collided. 


We explain in Fig. 3-4 the advantage of using monodirec- 
tional infinite links versus bidirectional links in the temporal 
dimension. Bidirectional edges with low values of w enforce 
only smoothness of variation of the contour in time, and yield 
segmentation errors in the case of low foreground/background 
contrast. For example, in the third image of Fig. 3 (day 235), 
the floe of interest collided temporarily with another ice floe. 
When using a weak smoothness constraint (see segmentation 
contour [Bi = 47), the small encountered floe collided with 
the floe of interest during a certain number of consecutive 
frames would be considered as a part of the foreground. 
Enforcing more smoothness in space-time to avoid this has 
the undesirable effect of smoothing the foreground shape, so 
that the segmented foreground area is lower (underestimated) 
than the ground- truth for the first frames, and higher (over- 
estimated) for the last frames (see results [Bi = 167 in 
Fig. 3 and-4(b)). With the increase of w, the estimated 
foreground tends to the constant shape for all time moments, 
and the Dice score decreases. 

When the proposed shrinkage constraint is used instead, 
the segmentation accuracy increases with w, and w = oo 
yields results with monotonous shrinkage of the shape area 
(see Fig. 4(a)). Moreover, the proposed method copes well 


with rapid shrinkage events, without underestimating pre- 
ceding images, or overestimating the event itself at onset. 
Another advantage of using monodirectional infinite links is 
that there are no additional parameters to quantify temporal 
coherency. 

Fig. 5 depicts the computational time for the proposed 
graph-cut-based optimization as a function of the number 
of frames. The total computational time grows linearly with 
the number of frames, and is approximately twice the time 
that would be taken by the independent segmentation of each 
frame. 

B. Application 2: Growing Burned Areas in 
Satellite Observations 

Biomass burning has a significant impact on the Earth’s 
climate system. Satellite remote sensors acquire data for the 
continuous monitoring of burned areas at both regional and 
global scales. Thus, there is a need to develop methods for 
automated fire mapping. While most of the existing techniques 
for mapping burned areas analyze temporal evolution of each 
pixel in an image scene [27], recent studies have proved the 
advantage of considering spatial contextual classification for 
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(a) 



Fig. 4. Results for Application I. (a) Mean and standard deviation for the 
dice score as a function of the temporal link’s weight, when using mono- 
(green) and bidirectional (red) temporal links, (b) Area of a multiyear ice floe 
as a function of time, computed by using mono- and bidirectional links with 
different weights. 



Fig. 5. Computational time for the proposed segmentation of the ice floe set 
as a function of the number of frames. 

accurate fire classification [28], [29]. Both works [28], [29] 
map fires from MODIS data by detecting and classifying 
persistent changes in a daily vegetation-index time series. 
Giglio et al [29] exploit the closest fixed pixel’s neighbor- 
hood to refine fire classification. Lewis [28] segments and 
analyzes change detection maps between two consecutive 
time moments. Manual post-processing is needed to correct 
classification errors, which are a consequence of either a 
cloud cover, or low contrast between burned and unburned 
areas. 

In our study we analyzed two time series of Terra MODIS 
atmospherically-corrected Level 2G daily surface reflectance 
measurements over the tropical savannas in the Northern 
Australia (“MOD09GA” product, tile hSlvlO), each of the 
data sets being acquired during forty days of the dry season 
in September-October (days 244-283) of 2011 and 2013, 
respectively. Wildfires in this region of Australia are frequent 
and extensive. We used MODIS band 5 (1.240 jim) 500-m 
land surface reflectance data as they provide the highest 
burned-unburned separability and are largely insensitive to 
smoke aerosols [27]. Each time series comprised a set of 



Fig. 6. Flowchart of the segmentation algorithm applied for Application 2. 

r = 40 images with spatial dimensions W x H = 
400 X 400 pixels. Fig. 7(a) shows three images from each 
of the considered sets, where black pixels denote missing 
data (MODIS does not provide 100% daytime coverage of 
the terrestrial surface every day). 

We used MODIS Collection 5.1 Direct Broadcast Monthly 
Burned Area Product (MCD64A1, see Fig. 7(b)) [29] for 
learning and testing the proposed method, i.e. for computing 
an initial histogram of burned areas and comparing fire maps, 
respectively. The MCD64A1 product contains fire classifi- 
cation maps, where each pixel is associated with either an 
estimated day of burn, or an unburned flag, or an unmapped 
flag due to insufficient data. These maps are computed by 
applying the approach from [29] on two 500-m MODIS 
channels coupled with 1-km MODIS active fire observations. 

We segmented each of the considered image time series 
by applying the following iterative procedure 
(the flowchart is shown in Fig. 6): 

Initialization: ^ := 0. The initial training mask of burned 
areas is built using MCD64A1 product, by selecting the 
pixels burned during the days {t\ — D,t\ — This mask can 
also be created based on ground observations of the considered 
area on the day {t\ — 1). 

1. The training mask of unburned areas is constructed 
by dilating R^ with a disk of radius r [30] and selecting 
the complementary of the resulting image. 

2. For a subset of r images f = [t\-\-kT\t\-\-{k-\-\)T' —Y\, 
intensity histograms of the MODIS band 5 for burned 
p\I\B) and unburned p\I\U) areas are computed, 
using the masks R^ and R]^ , respectively. If the data for 
some pixels is missing, these pixels are not considered 
when computing histograms. 
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Fig. 7. Two time series of MODIS images acquired in 2011 and 2013 (left and right, respectively), (a) MODIS band 5 images for three time moments: days 
251, 265 and 279. Black pixels denote missing data, (b) Maps from the MCD64A1 product of areas burned during the days 213-251, 213-265 and 213-279, 
respectively (white pixels = burned areas), (c-d) Segmentation maps for images (a) computed by: (c) the proposed method with growth constraint; (d) [w/o] 
method with no temporal constraints. 


3. Forthe images ri + (^+ 1)7' — 1], potentials are 

computed, assuming equal priors p\B) = p^{U) = 1/2: 


y/(i) 

= -ln[p\B\lj)] = 

y/(0) 

= -ln[p^(U\ll)] = 


—In 


—In 


p'(l!\B) 

pt(il\B) + pt(lj\U) 


, (7) 


P^(l!\U) 
lp^(lj\B) + p^(ll\U)J 


( 8 ) 


If data 1 1 is missing for pixel i at time we set 
y/(l) = y/(0) = 0 (no prior). Interaction terms are 
computed using Eq. 6. 

4. The graph-cut optimization is applied on a joint graph 
of the images [^i, -h -h 1)7' — 1], yielding -h 1)7' 
segmentation maps. 

5. If the whole set of 7 images is segmented, exit the 
algorithm. Otherwise: k := ^ + 1. The segmentation map 

pti+kT'-3 

is used as the new training mask of burned 
areas . We do not use the segmentation result of the 
last frame -h k7'), because extreme frames benefit 
from less information from neighboring frames, and are 
therefore more subject to segmentation errors. Go to 
step 1. 

We applied the described algorithm with the parame- 
ters empirically set as 7) = 31, r = 20, = 2 and 

7' = 20, and with different temporal regularizations, i.e., 
[w/o], [Mono = 0.25 . . . oo] and [Bi = 0.25 . . .ooj. Neither 
[Feedforward] nor [Bi = variable] methods are suited for 
segmenting image sequences with missing data. Fig. 8 (a, c) 
gives the resulting dice scores for both data sets. We find that 


a weak temporal regularization (both bi-/monodirectional) out- 
performs a segmentation without temporal constraint. Increas- 
ing the bidirectional temporal regularization towards high 
values, however, decreases the performance. On the opposite, 
introducing monodirectional infinite links to impose shape 
growth yields the most accurate results. As can be seen from 
Fig. 7 and 8(b, d), the proposed approach achieves comparable 
results with the method [29] by using only one MODIS 
channel and no post-processing, while [29] used two MODIS 
bands coupled with 1-km MODIS active fire observations and 
post-processing. Furthermore, the new method copes better 
with the missing or noisy data thanks to the introduced 
spatio-temporal graph. 


C. Application 3: Growing Tumor in 3D Medical Scans 

Glioma is the most frequent primary tumor of the brain. 
The tumor is known to grow steadily, and lesions are eval- 
uated with respect to volume change in different magnetic 
resonance image (MRI) modalities. In our experiment we 
evaluated a set of 760 multimodal image volumes - each 
comprising T1 MRI, contrast-enhanced T1 MRI (Tic), T2 
MRI, and T2 FLAIR MRI - acquired from ten patients ini- 
tially diagnosed with low grade glioma. The time series have 
3-14 time points, with 3-6 months in between any two acqui- 
sitions. All image volumes were rigidly registered and three 
2D slices intersecting with the tumor center were manually 
annotated through an expert in every volume, representing 
an approximate truth. Full 3D segmentations for images of 
each individual time point were obtained using a generative 
model for multimodal brain tumor segmentation [31]. This 
algorithm models the lesion with a latent atlas class [15] 
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Fig. 8. Results for Application 2, for two MODIS time series acquired in 2011 (a-b) and 2013 (c-d), respectively, (a, c) Mean and standard deviation for 
the dice score as a function of the temporal link’s weight, when using mono- (green) and bidirectional (red) temporal links, (b, d) Burned area as a function 
of time, when using no temporal links (blue), monodirectional infinite links (green) and MCD64AI product (red). 


amending the tissue atlas of the standard EM segmenter 
[32], [33]. We applied it to each multimodal data set of each 
time point in an independent fashion. The segmentation model 
delineates the lesion individually in each modality. It assumes 
that changes of the core (visible in Tic) will occur within the 
larger edema regions (visible in T2 or FLAIR) and, hence, to 
only have class transitions from healthy to edema and from 
edema to core. As the tumor grows steadily, we can assume 
that negative volume changes stem from imaging artifacts, 
such as local intensity changes, a common problem in MRI. 
To this end we model the tumor to be either stable, in this 
case regularizing the segmentation along time and suppressing 
noise, or to expand in volume between any two time points. 

We identified the foreground label F with tumor {edema 
and core) and background B with healthy tissue. Then the 
potential V^^(Lj’^) of label at voxel /, time point t, 
and imaging sequence 5* is equal to V^^(O) = ps^{F\P^^) 
and y/’^(l) = = 1 — The tumor 

probabilities were calculated from image volumes I using the 
generative model [31], and we identified tumor subclasses 
with p{F\F^^^^^) for core, and p{F\F^^^^^) with edema. 
We modeled the 3D spatial constraints through a 26 neigh- 
borhood {M) linking the central voxel with all its immediate 
neighbors. Interaction terms W-'j{L^f\LY) between neigh- 
boring voxels in each sequence 5' G [T1,T2] are computed 
from the channel- specific intensity differences as 

^ — ) ) 

with p = 0.5, a{p,q) = 1 /distance (p, ^), atot = 

Z<?sV(pixeip)«(/’’^) and A = 1/3 (max - min r’'). 
We impose growth constraint in 3D -ft as explained in 


Sec. II-B, and inclusion constraints as in Sec. II-C: the fore- 
grounds in T1 and Tic modalities are required to be included 
in the one of T2, which is included in the one of FLAIR. 

In our test we first segment the images from each time 
point independently, and calculate Dice scores averaged 
over all images of a time series as an estimate of the 
baseline performance. Then we test different regularizations 
in time, i.e., [w/o], [Mono] and [Bi], calculate Dice scores, 
and compare them against the baseline results from the 
unregularized segmentations. Fig. 9 shows results for two 
exemplary time series, and Fig. 10(a) reports differences 
between Dice scores of regularized segmentations and baseline 
segmentations. In this experiment “weak” regularizations refer 
to small regularization parameters {m 1), while “strong” 
regularizations refer to lu ^ 1, representing the infinite 
mono- and bi-directional links. Fig. 10(a) shows results for 
all ten time series. As for the previous two applications, both 
bi-/monodirectional temporal regularization with low values 
of w yields better accuracies when compared to the results 
without temporal links. Enforcing more smoothness with 
bidirectional links decreases segmentation accuracies, while 
introducing monodirectional “growth” regularization through 
infinite links improves performance (Fig. 10(a)). The volume- 
time graphs (Fig. 10(b)) of the segmented tumor structures 
show as very regular pattern (red), even being smoother than 
the manual segmentation (green). Moreover, the /6>g(volume)- 
time graph (Fig. 10(b), right) shows the exponential growth of 
the tumor core (dotted lines) that is associated with this disease 
[34], for this patient indicating a rapid tumor progression 
starting at about day 500. Longitudinal image segmentation, 
as obtained for the ten time series in our test, can be further 
analyzed, for diagnosis and treatment monitoring, e.g., through 
algorithms estimating the speed of the tumor outlines under 


3838 


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 9, SEPTEMBER 2014 



Fig. 9. Two time series of T2 and FLAIR MR image volumes, each of which acquired about 3-6 months apart. The left case is rapidly growing between 
the second and fourth scene, the right case displays intensity modifications in the last scene, leading to a suboptimal performance of the initial multimodal 
segmentation (yellow) [31]. The proposed multi-temporal segmentation with growth constraints (green) delineates areas similar to the manual evaluation 
(magenta), being more robust against intensity variations of the MR images. It does not smooth out outlines of rapidly growing tumors as conventional 
bi-directional temporal constraints would do. 




w/o inf_bi strong_bi weak_bi weak_mono strong_mono inf_mono 

Type of regularization 


(a) 



Full 3D volume (log scale) 



(b) 


Fig. 10. Results for Application 3. (a) Boxplots report changes in the average 
segmentation performance of the ten image sequences when testing different 
regularization approaches. (The box indicated quartiles, the whiskers outliers.). 
Using a strong monodirectional regularization acting as a growth prior yields 
the best results, (b) Volume-time plot for a patient with 14 observations 
(cmp. video in supplement). Solid lines indicate edema, dashed indicate 
tumor core that starts growing with constant rate at around day 500. The 
segmentation with growth constraint (red) returns results similar to the manual 
segmentation (green). Segmentations obtained by evaluating image volumes of 
each time point individually (blue) show significant variation, even obscuring 
the overall trend. 


globally-optimal spatio-temporal segmentation satisfying that 
constraint. The main idea was to introduce monodirectional 
infinite links between pixels at the same spatial locations in 
successive image frames, which prohibit a shape to shrink or 
grow over time, and then to perform a graph cut optimization 
on the constructed graph. The limitation of the proposed 
method is that it can be applied only to a time series of images 
on the same scale and perfectly aligned with respect to the 
foreground object, so that in the case of shape growth the 
foreground at the moment (^ + 1) contains all the foreground 
pixels at the previous moment f, sharing the same spatial 
locations. Thus, if the foreground object moves over time, 
images must be aligned before applying the new graph-cut- 
based technique. We also demonstrated the possibility to 
impose inter- sequences object inclusion constraints by adding 
directed infinite links to the joint graph associated to all 
sequences. 

We validated the performance of the proposed approach for 
the segmentation of growing (burned areas) and shrinking (ice 
floe) shapes from 2D time series of satellite images, and for 
the segmentation of growing 3D tumor volumes from MR 
sequences. The new method proved to be robust to important 
noise and low contrast, and to cope well with missing data. 
Moreover, it showed linear complexity in practice, so that 
globally optimal shape-consistent segmentations of image time 
series are obtained in a matter of seconds. 

We plan to apply the proposed framework to other appli- 
cations, such as the segmentation of wrinkles during the 
cosmetological treatment. We are also currently extending our 
work on tumor segmentation to the case of several medically- 
motivated classes Q, expressing tumor evolution stages, that 
satisfy an inclusion constraint C/ C C/+i,V/, considering 
additional relations between tumor substructures. 


anatomical constraints [35]. Extension of the current 5D seg- 
mentation could integrate this speed estimation, or extend the 
multimodal tumor segmentation [31] for longitudinal data sets. 

IV. Conclusions and Future Work 

In this work, we addressed the problem of shape segmen- 
tation in 2D and 3D sequences of very noisy/low-contrast 
images, where shapes monotonously grow or shrink in time. 
In order to enforce shape growth or shrinkage, we pro- 
posed a new graph-cut-based method for computing the 


Appendix 

Submodularity proof 


This proof is better viewed in colors. 

Wij can be written as a sum of terms over time t, which is, 
if Ti ^ Tj\ 


t 


VF/^ (0, 0) if t < mm{xi,Tj) 
0) if Ti < Tj 
Wj j(l, 1) if ? > max(r,, zj) 


(A) 

(B) 
(D) 
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and 

■ w'/0,0) 

if t < min(r/, Zj) 

(A) 

Wij(n,Tj) = Y. 


if Zj ^ t < Zi 

(C) 

t 


if t ^ max(r/, r^) 

(D) 

Otherwise. 





We can thus represent Wij(zi,Tj) by the sequence of 
cases A, B, C or D chosen for each t\ 


Finally, in case (3), the terms of the submodularity condition 
write: 


n Tj t' t[ 

— ^ ^ ^ ^ 1 
AAABBBBDDDDDDDDDDD 

AAAAAAAAAAACCCDDDD 

Wi,j{n,Tj) : 
Wijiriri) : 

Wijirirj) : 

W^An,r'^) ■■ 

A A A A A A A CCCCCCCDDDD 
AAABBBBBBBBDDDDDDD 


Wij in = AAAAA bbbbbbb dddddd 

t 

when Ti ^ Tj, and otherwise : 

'T- 'T- 


Wij (Ti,Tj) = Y^ AAAAA CCCCCCC DDDDDD 

t 

Now, to represent the submodularity condition, let r/, zj, z-, z^j 
be any times, satisfying Zi ^ z[ and zj ^ r j . We will suppose 
moreover that t/ ^ Zj \ otherwise consider Wjj(zj, r/) instead 
and reverse names i and j . Three cases are possible: 

1) Ti ^ r/ ^ Zj ^ z'j 

2) Zi ^ Zj ^ z'. ^ rj 

3 ) Zi ^ Zj ^ z'j ^ z'. 

In case (1), the terms of the submodularity condition write: 


Wijin,Tj) : 
Wijir’,ri) : 

AAABBBBBBBBDDDDDDD 

AAAAAAABBBBBBBDDDD 

Wi,iiri,T;) : 

AAAAAAABBBBDDDDDDD 

AAABBBBBBBBBBBDDDD 


and the submodularity condition is satisfied if the sum of the 
two first rows is less than or equal to the sum of the two last 
rows. It turns out that in the particular case above, the sums are 
equal for each instant t, and consequently the submodularity 
condition is satisfied as being an equality. 

In case (2), the terms of the submodularity condition write: 


Wij{n,Tj) : 

AAABBBBDDDDDDDDDDD 
AAAAAAAAAAAB B B DDDD 

W^An,r') : 

AAAAAAACCCCDDDDDDD 

AAABBBBBBBBBBBDDDD 


This time, the two sums, at any instant t, are equal if t < Zj or 
t ^ Z-. For other values of t, that is to say when Zj ^ t < z-, 
the sum A +Z) is to be compared with i.e. the question is 

to compare W] ■ (0, 0) + W/ • (1, 1) with W] ■ (0, 1) + W/ • (1, 0). 
As is binary submodular by hypothesis, and by definition 
of binary submodularity in equation (2), we have A D ^ 
B C, and thus the sum of the two first rows is less than or 
equal to the sum of the two last ones, which means precisely 
Wijizi, Zj) + Wij(z;, z'j) ^ Wijizi, rp + Wij(z;, Zj), and 
thus the submodularity condition for E is checked. 


The two sums are equal for all instants t < Zj or t ^ Zj . When 
Zj ^ t < z'j, the sum A + Z) is to be compared with B -\- C, 
as previously, and consequently the submodularity condition 
for E is checked again. 

In all cases (1), (2) and (3), E is proven to be submodular, 
and this concludes the proof. 
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